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Abstract 

This  article  discusses  the  use  of  a  symmetric  multiplicative  interaction  effect  to  capture  cer¬ 
tain  types  of  third-order  dependence  patterns  often  present  in  social  networks  and  other  dyadic 
datasets.  Such  an  effect,  along  with  standard  linear  fixed  and  random  effects,  is  incorporated  into 
a  generalized  linear  model,  and  a  Markov  chain  Monte  Carlo  algorithm  is  provided  for  Bayesian 
estimation  and  inference.  In  an  example  analysis  of  international  relations  data,  accounting  for 
such  patterns  improves  model  fit  and  predictive  performance. 
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1  Introduction 

Dyadic  data  consist  of  measurements  that  are  made  on  pairs  of  objects  or  under  a  pair  of  conditions, 
so  that  yij  denotes  the  value  of  the  (possibly  directed)  measurement  from  i  to  j.  Examples  include 
social  network  analysis,  “round  robin”  experiments  in  psychology,  and  comparative  data  in  which 
yij  might  be  a  measure  of  similarity  between  units  i  and  j.  In  the  social  networks  literature, 
modeling  has  focused  on  the  binary  case  where  yij  is  either  zero  or  one,  indicating  the  presence  or 
absence  of  a  “link”  from  i  to  j.  This  has  led  to  the  development  of  data  analysis  tools  based  on 
directed  graphs  and  the  study  of  exponentially  parameterized  random  graph  models  (Wasserman 
and  Pattison  1996).  For  valued  (non-binary)  dyadic  datasets,  a  perceived  lack  of  statistical  tools 
has  sometimes  led  to  ad-hoc  reductions  of  valued  responses  to  binary  data.  However,  AN OVA 
methods  are  available  for  valued  dyadic  data:  the  so-called  social  relations  model  (Warner,  Kenny, 
and  Stoto  1979;  Wong  1982)  allows  for  the  decomposition  of  the  variance  into  sender  and  receiver 
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specific  effects,  as  well  as  allows  for  correlation  between  responses  within  a  dyad.  Such  a  model 
has  been  studied  in  the  context  of  a  linear  group  symmetry  model  by  Li  (2002),  and  advances  in 
variance  component  analysis  have  been  made  by  and  Gill  and  Swartz  (2001)  and  Li  and  Loken 
(2002).  These  models  generally  presume  normally  distributed  data  and  additive  effects,  and  thus 
the  lack  of  any  sort  of  dependence  beyond  those  specified  by  second-order  moments.  In  contrast, 
many  observed  dyadic  datasets  exhibit  certain  forms  of  third-order  dependence,  and  often  it  is  of 
scientific  interest  to  quantify  these  higher  order  patterns. 

In  this  article  we  propose  a  class  of  generalized  additive  models  based  on  the  social  relations 
model,  but  incorporate  third  order  dependence  via  a  bilinear  effect.  The  bilinear  effect  for  a  pair 
{i,j)  is  simply  the  inner  product  of  unobserved  characteristic  vectors  Zi  and  zj,  specific  to  units  i 
and  j  respectively.  This  approach  is  similar  in  spirit  to  the  latent  variable  methods  proposed  by 
Hoff,  Raftery,  and  Handcock  (2002)  to  capture  transitivity  in  a  social  network  dataset,  but  has 
some  computational  and  conceptual  advantages.  The  bilinear  effect  is  also  a  type  of  multiplicative 
interaction  (Gabriel  1978;  Marasinghe  and  Johnson  1982;  Oman  1991).  The  models  presented  in 
this  article  are  similar  to  the  generalized  bilinear  regression  models  studied  by  Gabriel  (1998),  who 
considered  approximate  maximum  likelihood  estimation  in  the  context  of  factorial  designs.  In  this 
article,  we  show  how  a  bilinear  effect  can  be  used  to  represent  certain  forms  of  dependence  often 
seen  in  dyadic  data,  and  develop  a  Markov  chain  Monte  Garlo  algorithm  based  on  Gibbs  sampling, 
providing  arbitrarily  exact  Bayesian  inference.  With  some  modifications,  the  algorithm  can  be  used 
as  a  means  of  making  Bayesian  inference  for  a  broad  class  of  generalized  bilinear  regression  models 
with  mixed  effects. 

In  the  next  section,  we  discuss  the  basic  linear  mixed  effects  model  for  dyadic  data  and  the 
resulting  dependence  structure.  In  Section  3,  we  discuss  types  of  third-order  dependence  often  seen 
in  network  datasets  and  the  use  of  a  bilinear  effect  to  capture  such  dependence.  Section  4  gives 
a  Markov  chain  Monte  Garlo  (MGMG)  algorithm  which  can  be  used  to  obtain  samples  from  the 
posterior  distribution  of  the  parameters.  Issues  such  as  model  fit,  model  selection  and  interpretation 
are  discussed  in  the  context  of  a  data  analysis  on  international  relations  in  Section  5.  A  discussion 
follows  in  Section  6. 

2  Linear  Mixed  Effects  Models  for  Exchangeable  Dyadic  Data 

Suppose  we  are  only  interested  in  estimating  the  linear  relationships  between  responses  yij  and  a 
possibly  vector  valued  set  of  variables  Xij,  which  could  include  characteristics  of  unit  i,  character¬ 
istics  of  unit  j,  or  characteristics  specific  to  the  pair.  In  this  case  we  might  consider  the  regression 
model 

yi,j  —  l3  Xij  J-  Cij,  (1) 

where  yi^i  is  typically  not  defined.  The  generalized  least  squares  estimate  (3  and  its  covariance 
matrix  depend  on  the  joint  distribution  of  the  Cij’s  only  through  their  covariance.  It  is  often 
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assumed  in  regression  problems  that  the  regressors  Xjj-  contain  enough  information  so  that  the 
distribution  of  the  errors  is  invariant  under  permutations  of  the  unit  labels.  This  assumption  is 
equivalent  to  the  n  x  n  matrix  of  errors  (with  an  undefined  diagonal)  having  a  distribution  that  is 
invariant  under  identical  row  and  column  permutations,  so  that  {eij  ■  i  ^  j}  is  equal  in  distribution 
to  {£77(1), ttQ)  ■  i  j}  for  any  permutation  vr  of  {1, . . .  ,n}.  This  condition  is  called  weak  row-and- 
column  exchangeability  of  an  array.  For  undirected  data,  such  exchangeability  implies  a  “random 
effects”  representation  of  the  errors,  in  that  ejj  is  equal  in  distribution  to  /(^, Oj, 7ij)  where 
are  independent  random  variables  and  /  is  a  function  to  be  specified  (Aldous  1985, 
Theorem  14.11).  If  in  addition  to  the  above  invariance  assumption  we  also  model  the  errors  as 
Gaussian,  then  the  joint  distribution  can  be  represented  in  terms  of  a  linear  random  effects  model. 
In  the  more  general  case  of  directed  observations,  we  can  represent  the  joint  distribution  of  the 
Cij’s  as  follows: 


€i^j  —  U7  T  bj  T  (^) 

{ai,bi)'  ~  multivariate  normal(0,  T,ab  = 

multivariate  normal(0,  S^),  = 

with  effects  otherwise  being  independent.  The  covariance  structure  of  the  errors  (and  thus  the 
observations)  is  as  follows: 


^ab 

^ab 

2 

2 

2 

2 

=^a  +  +  0-^  +  0-2  E{eijei^k)  =  CFa 

E{eij€j^i)  =  pa'^  +  2aab  E{eijekj)  =  al 

E{€ij€k^l^  —  0  E(^6ij€lc^i)  —  O'ab 

and  so  a'^  represents  the  dependence  of  observations  having  a  common  sender,  that  of  obser¬ 
vations  having  a  common  receiver,  and  p  represents  the  correlation  of  observations  within  a  dyad 
(often  interpreted  as  “mutuality”  or  “reciprocity”).  This  has  been  called  the  “social  relations”  or 
“round  robin”  model  (Warner  et  al.  1979;  Wong  1982),  and  is  related  to  a  model  for  diallel  cross 

data  used  by  Cockerham  and  Weir  (1977).  The  model  is  a  special  case  of  a  linear  group  symmetry 

model  (Andersson  and  Madsen,  1998),  and  has  been  studied  in  this  context  by  Li  (2002).  Recent 
advances  in  variance  component  estimation  have  been  made  by  Gill  and  Swartz  (2001)  and  Li  and 
Loken  (2002). 

To  analyze  responses  in  particular  sample  spaces,  the  error  structure  described  above  can  be 
added  to  a  linear  predictor  in  a  generalized  linear  model: 


^i,j 

E{yij\6ij) 

P{yi,2  ■  ■  ■  1  ?/n,n— l|^l,2  ■  ■  ■  ■>  ^n,n— l) 


P'xiJ  -\-  (li  +  bj  +  'Jij 

i¥=j 


(3) 
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This  is  a  generalized  linear  mixed-effects  model  with  inverse-link  function  g{6),  in  which  the  obser¬ 
vations  are  modeled  as  conditionally  independent  given  the  random  effects,  but  are  unconditionally 
dependent.  The  covariance  pattern  for  the  observations  is  given  approximately  as 

=  E[Cov(yijj^,yi2,i2l^h,ii>^i2j2)]  +  Cov[E(yij  jJ,  £’(yi2,i2l^*2j2)] 

=  ^[O]  +  Cov  [5(0*1  jJ,5((6»i2j2)] 

^  Eov(0*i  ,  0*2,J2)  ^  9  (/^  (/^  ^^2,72)5 

where  the  pattern  for  Cov(0ji  jj,  0j2,i2)  the  same  as  that  for  the  Cij’s  given  above.  However,  unlike 
the  linear  regression  case,  (3  is  not  given  by  linear  combinations  of  the  observations,  and  E{f3)  and 
Cov(/3)  are  not  functions  of  only  the  first  and  second  order  moments  of  the  data.  Model  lack  of  fit, 
or  third  and  higher  order  dependence,  will  affect  our  inference  on  (3.  Many  dyadic  datasets  exhibit 
certain  forms  of  third  order  dependence.  Indeed,  it  is  these  higher  order  patterns  of  dependence 
that  are  often  of  interest,  and  may  also  provide  information  useful  for  predictive  inference. 

3  Modeling  Third  Order  Dependence  Patterns 

Some  dependence  patterns  commonly  seen  in  dydaic  datasets  have  been  given  the  descriptive  titles 
of  transitivity,  balance,  and  clusterability.  In  the  context  of  binary  data,  graph  theoretic  definitions 
of  these  concepts  appear  in  Wasserman  and  Faust  (1994,  chapter  6)  and  are  as  follows: 

Transitivity:  For  directed  binary  data,  an  ordered  triad  i,  j,  k  is  transitive  if  whenever  yij  =  1 
and  yj^k  =  1,  we  have  yi^k  =  1;  be.  “a  friend  of  a  friend  is  a  friend.” 

Balance:  For  signed  unordered  relations,  a  triad  i,j,  k  is  said  to  be  balanced  if  x  yj^k  x  yk,i  >  0. 
The  idea  is  that  if  the  relationship  between  i  and  j  is  “positive”  then  they  will  relate  to 
another  unit  k  in  an  identical  fashion,  so  that  if  yij  >  0  then  ^  and  y^^i  are  either  both 
positive  or  both  negative. 

Clusterability:  This  is  a  relaxation  of  the  concept  of  balance.  A  triad  is  clusterable  if  it  is  balanced 
or  the  relations  are  all  negative.  The  idea  is  that  a  clusterable  triad  can  be  divided  into  groups 
where  the  measurements  are  positive  within  groups  and  negative  between  groups. 

In  a  statistical  sense,  a  dataset  will  display  varying  degrees  of  transitivity,  balance,  or  clusterability. 
Often  it  is  found  that  there  are  more  transitive,  balanced,  or  clusterable  triads  than  would  be 
expected  under  models  (2)  or  (3).  Another  indication  of  third  order  dependence  would  be  if  after 
fitting  a  regression  model  and  obtaining  the  residuals  ejj,  the  average  value  of  Cjj  x  ej^k  x  is 
substantially  larger  than  zero,  the  expected  value  presumed  by  model  (2). 

Hoff  et  al.  (2002)  used  simple  functions  of  latent  characteristic  vectors  in  a  fixed  effects  setting 
to  capture  some  forms  of  transitivity,  balance,  and  clusterability.  For  example,  they  considered 
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models  in  which  =  (i'xi^j  +  f{zi,Zj)  where  f{zi,Zj)  =  —\zi  —  Zj\  (“the  distance  model”)  or 
f{zi,Zj)  =  z'^Zj/lzjl  (“the  projection  model”).  In  what  follows,  we  consider  a  similar  approach 
using  the  inner  product  kernel  f{zi,  Zj)  =  z'^zj,  and  give  random  and  fixed  effects  interpretations. 

Adding  the  bilinear  effect  z'^zj  to  the  linear  random  effects  in  models  (2)  and  (3)  gives 

^i,j  =  +  bj  +  'Jij  +  (4) 

where  the  random  effects  ai,bj  and  'jij  are  modeled  with  the  multivariate  normal  distributions 
described  above.  We  have  written  =  z'^Zj  to  suggest  the  interpretation  of  z'^Zj  as  a  mean-zero 
random  effect:  If  the  z’s  are  modeled  as  independent  /c-dimensional  multivariate  normal  random 
vectors  with  mean  zero  and  covariance  matrix  T,z,  then  the  resulting  distribution  for  the  .^’s  has 
the  following  moment  properties: 

•  E(?ij)  =  0; 

•  =  trace  T.I; 

•  =  trace 

with  all  other  second  and  third  order  moments  equal  to  zero.  Note  that  an  orthogonal  transfor¬ 
mation  of  the  z’s  leaves  z^Zi  invariant,  so  we  can  assume  T,z  is  a  diagonal  matrix  (otherwise,  the 
off-diagonal  terms  are  non-identifiable) .  For  simplicity  we  focus  on  the  case  =  a'^I^xk,  for  which 
the  above  moments  are  0,  /ca^,  and  ka^  respectively.  With  added  to  the  error  term,  the  nonzero 
second  and  third  order  moments  are 

+  (Tl  +  a‘^  +  ka*  E{eij€i^k)  =  crl 
=  pa'^  +  2aah  +  ka^  E{eijek,j)  =  al 

E{^(iip€jk^k,i)  z  Ej(^€ij€k^i) 

Thus  the  effect  =  z'^zj  can  be  interpreted  as  a  mean-zero  random  effect  able  to  induce  a  particular 
form  of  third-order  dependence  often  found  in  dyadic  datasets.  Marginally,  as  k  increases  the 
distribution  of  j-  will  converge  to  a  normal  distribution,  due  to  the  central  limit  theorem.  Jointly, 
the  Markov  dependence  graph  for  the  ^’s  has  two  dyads  as  neighbors  if  they  have  at  least  one  unit 
in  common. 

Considered  as  fixed  effects,  the  ^’s  can  be  viewed  as  interaction  terms  that  are  highly  constrained 
due  to  the  functional  dependence  on  the  z’s.  The  constraint  is  easy  to  visualize  in  terms  of  the 
z’s:  If  Zi  and  Zj  are  vectors  of  similar  direction  and  magnitude,  then  z[zk  and  z'^Zk  will  not  be  too 
different.  This  feature  can  be  related  to  transitivity,  which  is  conceptually  a  measure  of  how  ^i^k 
is  a  function  of  and  ^j^k-  Considering  for  the  moment  z’s  scaled  to  have  unit  length  so  that 
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\zi  —  Zj  \  =  yj2{l  —  z[zj),  by  the  triangle  inequality  we  have 

1  -  z[zk  <  1  -  z[zj  +  1  -  z'jZk  +  2^ {I  -  z[zj){l  -  z'jZk),  or 

^i,k  ^  ^i,j  +  Cj,k  ~  1  +  2,'\J (1  —  ~  ^j,k)  ; 

which  gives  a  lower  bound  for  ^i^k  in  terms  of  and  ^j^k- 

Balance  and  clusterability  describe  how  similar  ^i^k  and  ^j^k  are  as  a  function  of  ^ij.  For  scaled 
z’s,  we  have 

\^i,k  ^j,k\  ~  \Zk{Zi  Zj)\  <  \Zk\  X  \Zi  Zj\  =  \Zi  Zj\. 

Noting  that  z'^zj  =  cos{(j)i  —  (pj),  where  (pi  is  the  angle  of  Zi  from  a  fixed  axis,  we  have 

\Zi  -  Zj 


and  so  \^i^k  —  ^j,k\  <  y^2(1  —  Ci,j)-  If  ^i,j  is  large,  the  difference  between  ^i^k  and  ^j^k  must  be  small. 
If  is  negative  one,  the  difference  is  unconstrained  and  could  range  from  zero  to  a  maximum  of 
two  (in  this  scaled  case). 

4  Parameter  Estimation 

In  the  frequentist  setting,  approximate  estimation  for  generalized  linear  mixed  effects  models  often 
proceeds  via  Taylor  expansions  and  iteratively  reweighted  least  squares  for  the  fixed  effects,  along 
with  approximate  restricted  maximum  likelihood  estimation  for  the  variance  components  (Schall 
1991;  Breslow  and  Clayton  1993;  Wolfinger  and  O’Connell  1993;  McGilchrist  1994).  The  accuracy 
of  these  approximate  methods  is  generally  dependent  on  the  sample  size,  see  Booth  and  Robert 
(1998)  for  a  discussion.  Gabriel  (1998)  suggests  an  algorithm  along  these  lines  for  the  generalized 
bilinear  mixed  effects  model.  Alternatively,  Zeger  and  Karim  (1991),  Gelfand,  Sahu  and  Carlin 
(1996),  and  Natarajan  and  Kass  (2000)  have  proposed  Gibbs  sampling  approaches  to  parameter 
estimation  for  generalized  linear  mixed  effects  models.  However,  estimation  is  more  difficult  for  the 
complicated  dependence  structure  of  the  random  effects  in  the  invariant  normal  model  (2).  Gill 
and  Swartz  (2001)  have  proposed  a  Gibbs  sampling  scheme  for  estimation  of  random  effects  in  the 
linear  case  with  the  identity  link,  although  we  have  found  that  their  algorithm  does  not  mix  well 
when  covariates  are  included,  due  to  a  weak  identifiability  of  the  unit  level  random  effects  and 
certain  regression  coefficients:  As  discussed  in  Gelfand,  Sahu,  and  Carlin  (1995)  the  random  effects 
a  and  b  will  be  confounded  to  a  degree  with  each  other  and  to  regression  parameters  associated  with 
predictors  that  do  not  vary  across  receivers  (i.e.  sender-specific  effects)  or  across  senders  (receiver- 
specific  effects).  For  example,  a  population-level  intercept  is  one  such  parameter.  To  obtain  a 
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“cleaner”  partition  of  the  variance  and  a  more  efficient  MCMC  sampling  scheme,  we  decompose 
Xij  into  Xij  =  ixd,ij,Xs^i,Xrj),  i.e.  into  dyad  specific  regressors  Xd,ij,  sender  specific  regressors  Xg^i 
and  receiver  specific  regressors  Xrj-  The  generalized  bilinear  model  is  then  rewritten  as 

^i,j  —  f^d^d,i,j  T  T  T  {Pr^r,j  T  bj')  -|-  'JiJ  T  Zj^Zj 

or  equivalently 


^i,j  ~  Pd^d,i,j  +  Si  +  Xj  +  7ij  +  z'j^Zj 

Si  —  T 

Ti  —  [3^ X’p^i  -\~  hi- 


This  parameterization  for  the  linear  unit-level  effects  is  similar  to  the  “centered”  parameterizations 
suggested  by  Gelfand  et  al.  (1995,  1996).  Note  that  an  intercept  can  be  thought  of  as  both  a  sender 
or  receiver  specific  effect.  For  symmetry,  we  include  the  constant  1/2  at  the  beginning  of  each  Xs,i 
and  Xrj  vector,  and  estimate  the  first  components  of  j3s  and  Pr  as  being  equal. 

Using  the  above  reparameterization  for  6ij,  we  estimate  the  parameters  for  the  generalized 
bilinear  regression  model  by  constructing  a  Markov  chain  in  {Pd,  Ps,  Pr,^ab,  Z,a‘^,Tiry}  (where  Z 
denotes  the  k  x  n  matrix  of  latent  vectors),  having  p{Pd,  Ps,  Pr,Ziab,  Z,a‘^,T,pY)  as  the  invariant 
distribution.  This  is  obtained  via  an  algorithm  based  on  Gibbs  sampling,  which  also  samples  s,  r 
and  the  0’s.  The  basic  algorithm  is  to  iterate  the  following  steps: 

1.  Sample  linear  effects: 

(a)  Sample  Pd,  s,  r\Ps,  Pr,  'Zab,  9,  Z  (linear  regression); 

(b)  Sample  Ps,  Pr\s,r,T,ab  (linear  regression); 

(c)  Sample  and  from  their  full  conditionals. 

2.  Sample  bilinear  effects: 


(a)  For  i  =  1, . . . ,  n:  sample  Zi\{zj,j  /  i},9,  P,  s,  r,  (a  linear  regression); 

(b)  Sample  from  its  full  conditional. 


3.  Sample  dyad  specific  parameters:  Update  {di,j,9j^i}  using  a  Metropolis-Hastings  step: 


(a)  Propose  (  p 

(b)  Accept  (  ) 


with  probability  1- 


Various  combinations  of  the  above  steps  can  be  used  to  estimate  different  models.  The  steps  in  1 
alone  provide  a  Bayesian  estimation  procedure  for  the  linear  regression  problem  having  an  error 
covariance  as  in  (2).  Bayesian  estimation  of  the  normal  bilinear  model  with  the  identity  link  could 
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proceed  by  replacing  each  Oi^j  with  and  only  iterating  steps  1  and  2.  Estimation  of  a  generalized 
linear  mixed  effects  model  with  random  effects  structure  given  by  (2)  could  proceed  by  iterating 
steps  1  and  3.  The  full  conditional  distributions  required  to  perform  steps  1  and  2  are  given  below. 

Note  that  the  0’s  are  essentially  unrestricted  in  the  above  sampling  scheme.  At  this  level  the 
fit  is  saturated  and  does  not  depend  on  the  regressors,  at  least  to  the  degree  that  the  prior  for 
is  diffuse.  What  the  MCMC  algorithm  above  provides  is  essentially  a  saturated  fit  for  the  0’s 
(although  somewhat  smoothed  by  the  common  variance)  and  an  AN OVA- like  decomposition  of  the 
0’s  into  regressor,  sender,  receiver  and  inner-product  effects. 


4.1  Conditional  Distributions  for  the  Linear  Effects  Components: 

Noting  that  9ij  —  z[zj  =  -|-  Sj  -|-  rj  -|-  7*^,  we  see  that  conditional  on  the  0’s  and  z’s,  the 

other  parameters  can  be  sampled  using  a  standard  Bayesian  normal-theory  regression  approach, 
although  with  a  complicated  covariance  structure. 


Pull  conditional  of  {(3d,s,r) 
we  let  Uij  =  6ij  +  Oj^i  —  ‘lz\zj 


:  Similar  to  Wong’s  (1982)  approach  to  the  invariant  normal  model, 
and  Uij  =  0ij'  —  0j7  for  i  <  j.  We  then  have 

/  \  /  \  (  Qrl  \ 


U 

V 


Xu 

Xu 


s 

V  "  J 


+ 


Su 

Sv 


(5) 


where  Xu  and  Xu  are  the  appropriate  design  matrices  and  5u  and  Su  are  vectors  of  independent 
error  terms  with  variances  =  2a^(l  +  p)  and  =  2it^(1  —  p)  respectively.  The  full  conditional 
distribution  of  {f5d-,s,r)  is  then  proportional  to  p(u,  u|/3rf,  s,  r,  S^)  x  p(s,  r|/3s, /3r,  E^fe)  x  p{(5d)-  For 
a  multivariate  normal  prior  distribution  on  f5d,  the  term  in  the  exponent  of  the  full 

conditional  is 


■  X'uUlal  +  X'^v/al 


T-l 

■"/3d 

0 


0 


+  X'^Xu/al  +  X',Xu/al 


where  (j)'  =  {(3'^  s'  r'),  X^r  and  f3sr  are  the  combined  design  matrix  and  regression  parameters  for  s 
and  r,  and  is  the  covariance  matrix  of  {s'  r')' ,  which  is  easily  derived  from  T,ab-  The  conditional 
distribution  is  thus  multivariate  normal  {p,  E)  where 


E 


E 


+  X'uU/al  -h  X'^v/al 
+  X'^Xu/al  +  X',Xu/al 


Note  that  the  inverse  of  E*^  is  given  by 


E 


-1 

sr 


(*^f) {^ab/X)Inxn 
{^ab/X)Inxn  {^alX)Inxn 


A  —  ^2^2  2 

X  ^a^b  ^ab‘ 


Pull  conditional  of  {(3s,  Pr)-  The  full  conditional  of  {Ps,Pr)  is  proportional  to  p{s,  r\Ps,Pr,  ^ab)  x 
p{Ps,Pr)-  Assuming  a  multivariate  normal  {pp^^  >  ^/3sr)  prior  distribution  for  the  combined  regression 
parameters,  the  full  conditional  is  a  multivariate  normal  distribution  with  mean  and  variance  {p,  S) 
given  by 


p  =  T, 


+  ^sr'^ 


-1 

sr 


/  S^  —  1 


Pull  conditional  of  T,ab-  The  full  conditional  of  is  proportional  to  p{s,  r\Ps,  Pr,  ^ab)p{'^ab)- 
Using  a  prior  distribution  of  ~  inverse  Wishart(SabO)  (parameterized  so  that  EiT^ah)  = 
^abo/{t^  —  3)),  the  full  conditional  of  Sab  is  Sab|a, &  ~  inverse  Wishart(Sabo  +  {a  h)'{a  h),v  +  n), 
where  a  =  {s  —  XgPs)  and  b  =  {r  —  XrPr)- 


Pull  conditional  of  Using  prior  distributions  of  ~  inverse  gamma(atji,  0^(2)  and  ~ 
inverse  gamma(aai,  Q;„2),  the  full  conditionals  are  given  by  cJa|rt  ~  inverse  gamma(Q;t,i  +  ^  (2) ,  0^2  + 

\  andiTaIr’  ~  inverse  gamma(aai+^ (2) >  Y^[vi-Vi,j\'^),  where  ftjj  =  E[uij\Pd,Xij,Si,rj\ 

T  +  'Si  +  Sj  +  Ti  +  rj,  and  Vij  is  given  similarly.  The  covariance  matrix  can  be  re¬ 
constructed  from  and  via  =  (cr^  -|-  (T^)/4  and  p  =  {a^  —  (yi')l{(y\  +  fr^)- 

4.2  Conditional  distributions  for  the  Bilinear  Effects  Component: 

Let  ejj  =  {9ij  +  Oj^i  —  Uij)l2,  the  residual  of  the  symmetric  part  of  the  matrix  of  0’s  after  fitting 
the  linear  effects,  and  let  Su,i,j  =  'Jij  +  7j,i-  Considering  the  full  conditional  of  Zi,  we  have 

Si, I  = 

ei,2  = 

Ci,n  — 


ZiZi  -|-  (in, i, 1/2 
z'iZ2  +  (in, i, 2/2 


Zi^n  T 


and  we  see  that  sampling  Zi  from  its  full  conditional  is  equivalent  to  a  (Bayesian)  linear  regression 
problem.  Modeling  the  z’s  as  a  priori  independent  multivariate  normal  (0,  variables,  the  full 
conditional  of  Zi  is  multivariate  normal  (/U,  S)  with 

p  =  ATj  Z^id-i/al 

S  =  (S;1  +  4Z_,ZU/u2)-i 

where  denotes  the  A:  x  (n  —  1)  matrix  obtained  by  removing  the  Ah  column  of  Z,  and  ei-i 
denotes  the  vector  of  residuals  {eij  :  j  /  i}.  Using  an  inverse-Wishart(S20,  prior,  the  full 
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conditional  of  is  inverse-Wishart(S20  +  ZZ' +  n).  Alternatively,  if  we  restrict  to  be 
and  use  an  inverse  gamma(ao)  Oii)  prior,  then  the  full  conditional  is  given  by  it^|Z  ~  inverse 
gamma(ao  +  {nk)/2,ai  +  trace(Z'Z)/2). 

5  Data  Analysis:  International  Relations  in  Central  Asia 

We  analyze  data  on  international  relations  in  central  Asia  as  recorded  by  the  Kansas  Event 
Data  Project  (http://www.ku.edu/~keds/project.html)  and  described  by  Schrodt,  Simpson, 
and  Gerner  (2001).  News  stories  are  downloaded  from  the  Reuters  Business  Briefing  Service  on 
Afghanistan,  Armenia,  Azerbaijan,  and  the  former  Soviet  Republics  of  Central  Asia,  and  political 
interactions  between  countries  are  recorded  and  categorized.  We  take  our  response  t/ij  to  be  the 
total  number  of  “positive”  actions  reportedly  initiated  by  country  i  with  target  j  from  1992  to  1999 
(i.e.  after  the  breakup  of  the  Soviet  Union),  as  recorded  by  the  KEDS  project.  Positive  actions 
here  include  such  events  as  approval,  endorsement  or  praise  of  one  government  by  another,  military 
assistance,  formation  of  alliances,  promises  of  financial  or  policy  support  and  others  (essentially  all 
events  having  Goldstein  scale  values  greater  than  2.5,  except  cease-fire  or  ceding  of  power.  See  the 
KEDS  project  webpage  for  more  details).  We  include  in  our  population  the  99  countries  closest  in 
geographic  distance  to  Afghanistan,  plus  the  United  States,  giving  a  total  of  n  =  100  countries  for 
analysis.  We  note  that  seventeen  of  the  one- hundred  countries  had  zero  actions  as  either  initiators 
or  targets  of  actions  over  the  seven  year  period. 

5.1  Data  Description 

Some  descriptive  plots  of  the  raw  data  are  given  in  Eigure  1.  Panel  (a)  plots  log(l  -|- 
versus  log(l -|- yj,i)  for  each  country  i.  The  quantities  are  typically 

called  the  outdegree  and  indegree  of  unit  i,  respectively.  Note  the  strong  correlation,  which  suggests 
a  large  value  of  a  ah  j  a^^h)  in  the  random  effects  model  being  considered.  In  panel  (b)  we  plot  the 
log  of  each  country’s  outdegree  plus  one,  log(l  -|-  versus  log  population,  which  suggests 

a  positive  relationship  between  response  and  population  (a  plot  of  log-indegree  versus  population  is 
similar) .  In  panel  (c)  we  plot  the  response  on  a  log  scale  versus  the  geographic  distance  in  thousands 
of  miles  between  countries  i  and  j.  More  precisely,  this  distance  is  the  “minimum  distance”  between 
two  countries,  and  is  zero  if  i  and  j  share  a  border.  On  average,  the  number  of  events  between  two 
countries  decreases  as  geographic  distance  increases.  This  pattern  is  made  more  clear  by  separating 
out  the  measurements  involving  the  United  States  (which  are  circled). 

5.2  Model  and  Priors 

Note  that  the  data  are  from  an  observational  study,  and  that  the  data  are  not  randomly  sampled. 
Rather,  we  have  defined  a  population  of  units  based  on  geographic  distance  and  have  measurements 
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log(indegree+1)  log(population)  geographic  distance 

Figure  1:  Relationships  between  (a)  Outdegree  and  indegree;  (b)  Outdegree  and  population;  (c) 
Response  and  geographic  distance.  Responses  involving  the  United  States  are  circled. 

on  all  pairs.  For  this  analysis,  we  primarily  interpret  a  probability  model  as  a  tool  for  describing 
the  variance  in  the  dataset,  and  the  regression  coefficients  as  measures  of  the  multiplicative,  or 
log-linear,  components  of  the  relationship  between  response  and  regressors. 

We  fit  the  random  effects  model  (4)  to  the  data  using  a  Poisson  distribution  and  the  log-link, 
so  that  each  response  Uij  is  assumed  to  have  come  from  a  Poisson  distribution  with  mean  ,  and 
that  the  y’s  are  conditionally  independent  given  the  0’s.  We  decompose  the  variance  in  the  0’s  as 
follows: 

0i,j  —  /3o  4“  -|-  PsXi  -|-  (3rXj  -|-  €ij 

£i,j  =  fl*  +  bj  -|-  7j  j  -|-  z'iZj , 

where  Xij  is  the  geographic  distance  between  i  and  j  and  Xi  is  the  log  population  of  i.  For  estimation 
of  variance  components,  we  model  the  random  effects  as  having  the  following  multivariate  normal 
distributions:  (a*,  6*)'  ~  MVN(0,  ~  MVN(0,S^),  Zj  ~  MVN(0, cr^/fcxfc)-  Prior 

distributions  of  the  parameters  are  taken  to  be 

•  (3  ^  multivariate  normal(0, 100  x  I4X4); 

•  Safe  ~  inverse  Wishart(/2x25 4); 

•  ~  i-i-d.  inverse  gamma(l,  1),  =  {al  -h  p={o\-  -h  <t^). 

Posterior  calculations  proceed  as  described  in  Section  4. 
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k 

LLP(/c) 

logp{y\$,a,b,Z,t^) 

AIC 

0 

-3558.78 

-2432.54 

-2638.54 

1 

-3351.76 

-2316.56 

-2622.56 

2 

-3078.79 

-2214.68 

-2620.68 

3 

-3076.73 

-2123.49 

-2629.49 

4 

-3077.30 

-2038.05 

-2644.05 

Table  1:  Selection  of  k 

5.3  Selecting  the  Latent  Dimension: 

One  issue  in  model  fitting  is  the  selection  of  the  dimension  k  of  the  latent  variables  z.  Selection  of  k 
could  depend  on  the  goal  of  the  analysis.  For  example,  if  the  goal  is  descriptive,  i.e.  the  desired  end 
result  is  a  decomposition  of  the  variance  into  interpretable  components,  then  a  choice  of  /c  =  1,2 
or  3  would  allow  for  a  simple  graphical  presentation  of  a  multiplicative  component  of  the  variance. 
Alternatively,  one  could  examine  model  fit  as  a  function  of  k  based  on  the  log-likelihood,  or  use  a 
cross-validation  criterion  if  one  is  primarily  concerned  with  predictive  performance. 

Considering  likelihood-based  measures  of  fit,  the  log-probability  of  the  data  given  the  values  of 
the  parameters  gets  evaluated  for  each  update  of  the  0’s,  and  so  logp(y|0)  =  can 

be  calculated  with  no  extra  effort.  However,  such  a  quantity  is  not  appropriate  for  selecting  between 
models.  As  described  in  Section  4,  the  model  is  essentially  unrestricted  in  the  0’s,  giving  a  nearly 
saturated  fit  which  does  not  depend  much  on  the  choice  of  k  or  the  regressors  (provided  the  prior 
for  T,^  is  sufficiently  diffuse).  A  likelihood  that  is  more  appropriate  is  the  marginal  probability  of 
data  within  a  pair,  logp(Y\f3,a,b,  Z,T.^)  =  J2{i,j)^ogp{yij,yj^i\f3,ai,bj,aj,bi,Zi,Zj,T.^),  where  the 
sum  is  over  unordered  pairs.  This  is  essentially  the  log-likelihood  treating  the  a,  b,  and  z’s  as  fixed 
effects.  Note  that  in  general  logp{yij,yj^i\(5,ai,bj,aj,bi,  Zi,  Zj,T,^)  is  an  integral  over  and 
that  needs  to  be  approximated,  except  in  the  case  of  the  normal  model  with  the  identity  link. 

In  some  situations  the  purpose  of  the  model  is  to  make  predictions  of  unobserved  data.  For 
example,  suppose  only  a  subset  of  the  n{n  —  1)  responses  were  randomly  chosen  to  be  measured. 
As  long  as  we  have  some  measurements  for  each  unit,  we  can  estimate  the  effects  a,  b  and  z  for  each 
unit  and  make  predictions  for  missing  responses  based  on  these  estimates.  Although  prediction  is 
not  the  goal  for  these  data,  for  illustrative  purposes  we  compare  the  marginal  probability  criterion 
discussed  above  to  the  following  four-fold  cross  validation  procedure: 

1.  Randomly  split  the  set  of  ordered  pairs  {i,j  ■  i  ^  j}  into  four  test  sets  Ai,  A2,  A^,  A4. 


2.  For  A:  =  0,1, 2,3,4  : 
(a)  For  Z  =  1, 2, 3, 4  : 
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i.  perform  the  MCMC  algorithm  using  only  {t/ij  :  {i,j}  0  Ai},  but  sample  values  of 
6ij  for  all  ordered  pairs. 

ii.  Based  on  the  sampled  values  of  6ij  compute  the  posterior  mean  6ij  for  {i,j}  G  Ai 

and  the  log  predictive  probability  Ipp(^i)  =  ^ogp{yi,j\0ij). 

(b)  Measure  the  predictive  performance  for  k  as  LPP(A:)  =  Ipp(^z). 

3.  Select  k  based  on  LPP(A;). 

For  these  data,  the  marginal  likelihood  and  cross-validation  criteria  for  selecting  k  are  given 
in  Table  2.  The  cross  validation  procedure  suggests  that  models  having  a  dimension  of  /c  =  2,3 
or  4  have  roughly  the  same  predictive  performance.  In  terms  of  the  marginal  likelihood  criterion, 
the  biggest  improvements  in  fit  are  in  going  from  k  =  0  to  k  =  1  and  from  k  =  1  to  k  =  2.  The 
improvements  in  fit  in  going  from  2  to  3  and  from  3  to  4  dimensions  are  smaller.  Using  an  AlC-like 
criterion  and  penalizing  the  improvement  in  likelihood  by  the  number  of  additional  parameters  (100 
per  additional  dimension),  we  would  choose  k  =  2.  Based  on  these  results  (and  our  ability  to  plot 
results  in  two-dimensions)  we  choose  to  present  the  results  for  the  k  =  2  model  in  more  detail. 


0  50000  100000  150000  200000  0  50000  100000  150000  200000 

iteration  iteration 


Figure  2:  Marginal  MCMC  output  for  regression  coefficients.  Solid  lines  are  from  the  Markov  chain 
with  data-informed  starting  values,  dashed  lines  from  the  chain  with  uninformed  starting  values. 


5.4  Results  for  k  =  2 

Two  Markov  chains  of  length  200,000  each  were  constructed  using  the  algorithm  described  above. 
The  first  chain  used  starting  values  of  zero  for  all  regression  coefficients  and  country-specific  in¬ 
tercepts,  the  identity  matrix  for  Y^ah  and  a  value  of  0.1  for  and  components  of  Z  sampled 
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Figure  3:  Marginal  MCMC  output  for  variance  component  parameters. 


Ps 

Pr 

^ab 

^7 

P 

mean 

-0.18 

1.00 

0.94 

6.46 

6.37 

6.4 

1.23 

0.95 

1.99 

sd 

0.04 

0.17 

0.17 

1.23 

1.2 

1.21 

0.14 

0.01 

0.27 
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Table  2:  Posterior  means  and  standard  deviations  for  k  =  2 

independently  from  a  normal  distribution.  The  second  chain  used  starting  values  obtained 

from  the  following  procedure:  Maximum  likelihood  estimates  of  f3d,  s  and  r  were  obtained  by  fit¬ 
ting  an  ordinary  generalized  linear  model  using  geographic  distance  as  a  regressor  and  sender  and 
receiver  labels  as  factor  variables.  Estimates  of  (5o,(5s,Pr,  and  T,ab  were  obtained  from  the  esti¬ 
mates  of  s  and  r.  The  iteratively  reweighted  least-squares  fitting  procedure  produces  a  matrix  R  of 
working  residuals,  with  the  off  diagonal  elements  undefined.  An  estimate  Z  of  Z  was  then  obtained 
by  approximating  R  with  a  matrix  product  of  the  form  Z'Z.  This  can  be  done  with  an  iterative 
least-squares  procedure,  similar  to  the  Gibbs  sampling  procedure  outlined  in  Section  4.2:  see  ten 
Berge  and  Kiers  (1989)  for  more  details  on  this  problem.  An  estimate  of  is  then  obtained  from 
R  -  Z'Z. 

Samples  of  parameter  values  were  saved  from  the  Markov  chains  every  100  iterations,  and  are 
plotted  in  Figures  2  and  3.  Both  chains  appear  to  have  achieved  stationarity  after  about  50,000 
iterations,  and  so  we  base  our  inference  on  the  saved  samples  after  this  point.  Posterior  means  and 
standard  deviations  of  the  model  parameters,  based  on  the  3000  saved  MCMC  samples  (1500  from 
each  chain),  are  given  in  Table  2.  As  in  the  raw  data,  we  see  a  negative  relation  between  response 
and  geographic  distance  {E[j3d\y\  =  —0.18),  and  a  positive  relation  between  response  and  country 
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populations  {E[f3s\y\  =  1.00, -E[/3r|y]  =  0.94).  We  also  estimate  a  strong  positive  correlation  of 
within-dyad  responses  as  well  as  the  within-country  random  effects  a  and  b. 


Figure  4:  Posterior  mean  of  Z 

Next,  we  analyze  the  posterior  distribution  of  the  the  k  x  n  matrix  of  latent  vectors  Z.  Note 
that  the  probability  model  depends  on  Z  only  through  the  matrix  of  inner  products  Z'Z,  which  is 
invariant  under  rotations  and  reflections  of  Z.  Therefore,  log Pr(y|Z, /?, X)  =  log Pr(y|Z*, /3, X) 
for  any  Z*  which  is  equivalent  to  Z  under  the  operations  of  rotation  or  reflection.  Values  of  Z 
sampled  from  the  posterior  distribution  may  seem  at  first  to  be  highly  variable,  but  perhaps  are 
nearly  rotations  of  each  other  and  are  thus  not  highly  variable  in  terms  of  the  resulting  inner 
product  matrices.  To  appropriately  compare  sample  values  of  Z,  we  must  first  rotate  them  to 
a  common  orientation.  For  these  data  this  is  done  using  a  “Procrustean”  transformation  (Sibson 
1978),  in  which  for  each  sample  Z  we  find  the  rotation  Z*  of  Z  that  has  the  smallest  sum  of  squared 
deviations  from  an  arbitrary  fixed  reference  matrix  Zq.  The  rotated  matrix  Z*  which  minimizes  the 
sum  of  squares  is  given  by  Z*  =  ZqZ' {Z ZqZoZ')~^/‘^ Z .  See  Hoff  et  al.  (2002)  for  further  discussion. 
The  resulting  mean  of  Z*  is  given  in  Figure  4.  Marginal  uncertainty  in  the  z’s  could  be  displayed 
by  plotting  sample  z*’s  over  the  plot  of  the  means,  using  colors  to  distinguish  between  countries. 

Generally,  two  countries  will  be  modeled  as  having  z’s  in  the  same  direction  if  they  have  large 
responses  to  one  another  relative  to  their  total  number  of  actions  and  covariate  values,  and/or  if 
their  responses  involving  other  countries  are  similar  (a  model  which  can  distinguish  between  these 
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two  phenomena  is  proposed  in  the  discussion).  For  example,  Croatia  and  Slovenia  are  each  recorded 
as  the  initiator  of  an  action  with  the  other  as  a  target,  and  each  initiates  an  action  with  Serbia  as 
well.  With  the  exception  of  one  action  from  Slovenia  to  Italy,  these  are  the  only  events  recorded 
for  Croatia  and  Slovenia,  and  so  these  countries  are  “similar”  in  that  they  have  actions  involving 
each  other  and  to  Serbia,  and  only  one  other  action  involving  another  country.  Bosnia-Herzegovina 
and  Denmark  have  no  actions  with  Croatia  or  Slovenia,  but  like  Croatia  and  Slovenia  they  each 
have  one  action  with  Serbia  and  very  few  actions  otherwise  (each  has  one  action  with  Azerbaijan, 
and  no  other  actions),  and  are  thus  located  in  a  similar  direction.  Serbia,  although  active  with 
this  group  of  countries  (on  the  scale  of  their  response  rates),  has  actions  with  10  other  countries, 
and  is  therefore  placed  more  towards  the  center.  Of  course,  the  posterior  variances  of  the  z’s  for 
Croatia,  Slovenia,  Bosnia-Herzegovina,  and  Denmark  are  quite  high,  as  our  information  about  them 
is  coming  primarily  from  the  few  nonzero  responses  among  them. 


(a) 


var[log(yij+1)] 


(b) 


Figure  5:  Goodness  of  fit  tests:  (a)  Posterior  predictive  distribution  of  population  variance,  (b) 
Posterior  predictive  confidence  regions  for  country-specific  variance  in  action  initiation. 

Finally,  we  evaluate  some  aspects  of  model  adequacy  via  goodness  of  fit  statistics.  This  is 
done  by  comparing  the  observed  value  of  a  statistic  of  interest  T{Y)  to  its  posterior  predictive 
distribution  p{T{Yp^e(j^)\Y).  Samples  from  the  posterior  predictive  distribution  are  obtained  by 
simulating  datasets  using  the  parameters  sampled  by  the  Markov  chain  (see,  for  example  Gelman, 
Carlin,  Stern  and  Rubin  1995  chapter  6). 

In  the  present  case  we  might  be  interested  in  any  over  or  under  dispersion  of  the  data  relative 
to  the  Poisson  model.  We  evaluate  any  such  lack  of  fit  by  considering  as  test  statistics  the  overall 
sample  variance  of  log(yi j-|-l),  as  well  as  the  sample  variance  of  {log(yi j-|-l)  :  j  /  i}  for  each  i,  that 
is,  the  variance  of  responses  from  each  sender,  on  a  log  scale.  The  posterior  predictive  distributions 
of  these  quantities  were  estimated  by  sub-sampling  1000  values  of  {fSa,  s,r,  Z,T,.y)  from  the  two 
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Markov  chains,  generating  a  dataset  from  each  sub-sampled  set  of  parameter  values,  and  then 
computing  the  statistics  from  each  generated  dataset.  The  results  are  plotted  in  Figure  5.  The  first 
panel  gives  a  histogram  of  1000  samples  from  the  posterior  predictive  distribution  of  the  overall 
variance.  The  posterior  predictive  distribution  is  centered  around  the  observed  overall  variance, 
given  by  the  vertical  line,  and  no  lack  of  fit  is  indicated  by  this  statistic.  The  second  panel  of  Figure 

5  plots  the  observed  sender-specific  variances  for  each  country  versus  a  95%  posterior  predictive 
interval  for  that  quantity.  The  confidence  intervals  contain  the  observed  values  for  97  of  the  100 
countries,  and  thus  do  not  indicate  much  lack-of-fit.  The  Poisson  model  seems  to  fit  the  variance 
in  response  reasonably  well,  at  least  in  terms  of  these  statistics. 

6  Discussion 

This  article  has  presented  an  approach  to  modeling  third  order  dependence  patterns  often  seen 
in  dyadic  datasets,  such  as  social  networks.  The  models  are  based  on  generalized  linear  mixed 
effects  models  with  the  addition  of  a  reduced-rank  interaction  term  composed  of  inner  products 
of  latent  characteristic  vectors.  Such  an  approach  allows  for  the  analysis  of  dyadic  data  using 
familiar  regression  tools,  but  also  allows  one  to  capture  patterns  such  as  transitivity,  balance,  and 
clusterability  which  are  often  of  interest  to  social  science  researchers.  Other  approaches  to  capturing 
such  dependence  patterns  have  used  metric  distances  (Hoff  et  al.  2002)  and  ultrametric  distances 
(Schweinberger  and  Snijders,  2003),  although  not  in  the  presence  of  the  covariance  structure  (2). 
While  such  latent  distance  models  may  be  easy  to  understand,  the  inner-product  approach  has 
some  conceptual  appeal,  as  the  term  z[zj  can  be  viewed  as  a  mean-zero  random  effect. 

Another  dependence  pattern  often  of  interest  to  researchers  is  that  of  stochastic  equivalence, 
in  which  two  units  i  and  j  are  said  to  be  stochastically  equivalent  if  their  responses  have  the 
same  probability  distribution,  i.e.  . . . ,  t/i^n)  =  p{yj,i,  ■  ■  ■ ,  yj,n)-  The  model  considered  in  this 

paper,  as  well  as  the  latent  distance  approaches  mentioned  above,  potentially  confound  stochastic 
equivalence  patterns  with  those  of  clusterability  and  balance:  two  units  will  generally  be  estimated 
to  have  similar  latent  characteristic  vectors  if  they  have  strong  relations  to  each  other,  or  have 
similar  relations  to  others  unit  units.  However,  in  some  datasets  there  may  be  clusters  of  units  that 
relate  similarly  to  others,  but  not  strongly  to  each  other.  Nowicki  and  Snijders  (2001)  considered 
a  latent  class  model  which  identified  clusters  of  such  stochastically  equivalent  units,  but  did  not 
separately  consider  clustering  based  on  strength  of  relations.  A  possible  approach  to  modeling  both 
types  of  patterns  is  to  extend  the  bilinear  effect  discussed  in  this  paper  to  a  more  general  asymmetric 
bilinear  effect  such  as  z[Rzj,  where  R  is  a  k  x  k  matrix.  Estimation  of  similar  types  of  effects  has 
been  considered  by  by  Gabriel  (1998),  and  least  squares  representations  of  an  asymmetric  matrix 
Y  by  Z'RZ  has  been  considered  by  ten  Berge  and  Kiers  (1989),  Kiers  (1989)  and  Trendafilov 
(2002),  among  others.  In  the  present  application,  the  vector  Zi  could  be  interpreted  as  giving 
grades  of  membership  for  unit  i  to  each  of  k  classes,  and  Rim  as  the  response  rate  from  class  I  to  m. 
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Interestingly,  the  restriction  of  each  Zi  to  be  unity  at  one  component  and  zero  at  the  others  gives 

a  representation  of  the  latent  class  model  of  Nowicki  and  Snijders  (2000).  Unrestricted  estimation 

of  z'^Rzj,  in  the  presence  of  the  error  structure  (2),  is  a  topic  of  current  research  by  the  author. 

The  data  analyzed  in  Section  5,  along  with  R-functions  for  implementing  the  proposed  methods, 

are  available  at  the  author’s  website  www.stat.washington.edu/hoff. 
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